Exascale Ready Work-Optimal Matrix Inversion
Abstract
In this thesis I present OPT, a new algorithm for matrix inversion that builds on a matrix multiplication subroutine. It combines Strassen’s matrix inversion algorithm with Newton approximation. OPT overcomes the linear lower bound on the parallel runtime of Strassen’s inversion algorithm and of traditional Gaussian elimination, without the extra logarithmic factor of work that Newton approximation incurs. In particular, I prove that, given a work-optimal multiplication subroutine that runs in polylogarithmic time, OPT not only runs in polylogarithmic time as well but also needs only a constant factor more work than any work-optimal inversion algorithm. Additionally, I present a new stability result for Strassen’s matrix inversion algorithm combined with Newton approximation. As part of this thesis, I implemented OPT, Strassen’s inversion algorithm, and Newton approximation in a flexible test program, along with a matrix container class optimized for this purpose. I describe the design of this implementation and the use and difficulties of BLAS for the matrix multiplication subroutine. In the experimental part, I compare the runtime of OPT to that of the routine included in the Intel Math Kernel Library (MKL) and observe its numerical stability. The constant factors in the amount of work of OPT turn out to be no more than twice those of the MKL routine. As predicted, OPT proves to be very scalable: even on a computer with only eight cores it is already significantly faster than the MKL routine. Concerning numerical stability, OPT and Strassen’s algorithm do not live up to their bad reputation; instead, they produce results comparable to those of the MKL routine. I discover an unexpected instability of Newton approximation that makes it produce worse results than any other algorithm in the implementation, and I present further experiments on this instability.
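As an illustration of the Newton approximation mentioned in the abstract, the following is a minimal sketch of the classical Newton iteration for the matrix inverse, X_{k+1} = X_k (2I - A X_k). It is not the thesis implementation and not OPT itself. The function names `matmul` and `newton_inverse`, the starting guess A^T / (||A||_1 ||A||_inf), and the fixed step count are assumptions made for this example only.

```python
def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def newton_inverse(a, steps=30):
    """Approximate the inverse of a nonsingular square matrix `a`.

    Hypothetical helper for illustration: uses a fixed number of steps
    instead of a residual-based stopping criterion.
    """
    n = len(a)
    # Starting guess X0 = A^T / (||A||_1 * ||A||_inf); this scaling keeps
    # the eigenvalues of A*X0 in (0, 1], so the iteration converges.
    norm_1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    scale = 1.0 / (norm_1 * norm_inf)
    x = [[a[j][i] * scale for j in range(n)] for i in range(n)]

    for _ in range(steps):
        ax = matmul(a, x)
        # R = 2I - A*X
        r = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
             for i in range(n)]
        # X <- X * R; each step costs two matrix multiplications.
        x = matmul(x, r)
    return x


if __name__ == "__main__":
    a = [[4.0, 1.0], [2.0, 3.0]]
    for row in newton_inverse(a):
        print(row)
```

Every step consists only of matrix multiplications, which is why the method parallelizes well. The number of steps needed depends on how far the starting guess is from the inverse, which is where the additional work relative to a single multiplication-based inversion comes from.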
Similar resources
CFD Vision 2030 Study: A Path to Revolutionary Computational Aerosciences
matrix inversion problems requires the development of algebraic multigrid methods (AMG). At the same time, improvements to current multigrid strategies are required if these methods are to scale effectively on emerging massively parallel HPC hardware. Although NASA investment in further research on multigrid methods has stalled since the early 1990’s, considerable research has been directed tow...
Numerical inversion of Laplace transform via wavelet in ordinary differential equations
This paper presents a rational Haar wavelet operational method for solving the inverse Laplace transform problem and improves inherent errors from irrational Haar wavelet. The approach is thus straightforward, rather simple and suitable for computer programming. We define that $P$ is the operational matrix for integration of the orthogonal Haar wavelet. Simultaneously, simplify the formulae of...
Paving the Road to Exascale with Many-Task Computing
Exascale systems will bring significant challenges. This work attempts to address them through the Many-Task Computing (MTC) paradigm, by delivering data-aware job scheduling systems and fully asynchronous distributed architectures. MTC applications are structured as DAGs of tasks, with dependencies forming the edges. The asynchronous nature of MTC makes it more resilient than tradition...
A Direct Matrix Inversion-Less Analysis for Distribution System Power Flow Considering Distributed Generation
This paper presents a new direct matrix inversion-less analysis for radial distribution systems (RDSs). The method can successfully deal with weakly meshed distribution systems (WMDSs). Being easy to implement, direct methods (DMs) provide an excellent performance. Matrix inversion is the main reason of divergence and low efficiency in power flow algorithms. In this paper, the performance of t...
Cosmological Simulations in Exascale Era
The architecture of Exascale computing facilities, which involves millions of heterogeneous processing units, will deeply impact on scientific applications. Future astrophysical HPC applications must be designed to make such computing systems exploitable. The ExaNeSt H2020 EU-funded project aims to design and develop an exascale ready prototype based on low-energy-consumption ARM64 cores and FP...